Arm backend: Make per-channel quantization default#11873
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11873
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New FailuresAs of commit cfaa79f with merge base 2bd96df ( NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot label ciflow/trunk |
|
@pytorchbot label "partner: arm" |
|
@pytorchbot label "topic: not user facing" |
|
@martinlsm no need to add |
digantdesai
left a comment
There was a problem hiding this comment.
I assume rationale is accuracy but can you please add more details? And linear and conv weights only right? perf impact? I also didn't see any ATOL/RTOL change in this diff.
| "# Create and configure quantizer to use a symmetric quantization config globally on all nodes\n", | ||
| "quantizer = EthosUQuantizer(compile_spec)\n", | ||
| "operator_config = get_symmetric_quantization_config(is_per_channel=False)\n", | ||
| "operator_config = get_symmetric_quantization_config(is_per_channel=True)\n", |
There was a problem hiding this comment.
Nit
| "operator_config = get_symmetric_quantization_config(is_per_channel=True)\n", | |
| "operator_config = get_symmetric_quantization_config()\n", |
There was a problem hiding this comment.
@digantdesai I have resolved your code comment and answered all your questions in the updated commit message.
8175bb3 to
c3567b4
Compare
digantdesai
left a comment
There was a problem hiding this comment.
I am glad to see we didn't have to mess with the backend code at all when quantizing the weights differently. Thanks @martinlsm.
@digantdesai we did mess with the backend code but in a separate PR #11752. |
fb40b0a to
4dd805c
Compare
Support for per-channel quantization was recently added to the Arm backend. This patch changes the default setting to use per-channel quantization for weights in convolutional and linear layers, instead of per-tensor quantization, which was the previous default. The reason for this change is that per-channel quantization offers better numerical accuracy for models containing convolutional and/or fully connected layers. Unless there is an explicit limitation in the use case that prevents the use of per-channel quantization, it is generally preferred. The option to set quantization granularity can still be manually set using `get_symmetric_quantization_config(is_per_channel=False)`. This patch only changes the default. Unit and model tests are affected by this change. Error tolerances for those tests have not been changed, as model outputs are compared against a reference that uses the exact same quantization strategy. That is, if a model output is altered by this patch, the reference it is compared against would also be altered accordingly. To verify the impact of this change in terms of top-1 and top-5 accuracy, a manual test was run on MobileNetV2. The results show a noticeable improvement: - Per-tensor quantization Top-1 / Top-5 accuracy: 66.45% / 87.50% - Per-channel quantization Top-1 / Top-5 accuracy: 70.85% / 89.50% Change-Id: I35d5c62741c7f93b916560874689245db96a588b Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Previously we were just a few minutes off the 90 minute timeout. With per-channel quantizaiton enabled by defualt it seems that we exceed that limit consistently. This patch increases the timeout to 120 minutes. Change-Id: I20f3fb369329dd51e95ffec667617afe93c50aa3 Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
4dd805c to
07c2ba7
Compare
|
LOL That's what I thought :D Are we ready to merge this one then? |
Sebastian-Larsson
left a comment
There was a problem hiding this comment.
Unrelated CI failures
Support for per-channel quantization was recently added to the Arm
backend. This patch changes the default setting to use per-channel
quantization for weights in convolutional and linear layers, instead of
per-tensor quantization, which was the previous default.
The reason for this change is that per-channel quantization offers
better numerical accuracy for models containing convolutional and/or
fully connected layers. Unless there is an explicit limitation in the
use case that prevents the use of per-channel quantization, it is
generally preferred.
The option to set quantization granularity can still be manually set
using
get_symmetric_quantization_config(is_per_channel=False). Thispatch only changes the default.
Unit and model tests are affected by this change. Error tolerances for
those tests have not been changed, as model outputs are compared against
a reference that uses the exact same quantization strategy. That is, if
a model output is altered by this patch, the reference it is compared
against would also be altered accordingly.
To verify the impact of this change in terms of top-1 and top-5
accuracy, a manual test was run on MobileNetV2. The results show a
noticeable improvement:
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218